I have set echo=FALSE so that most of the code chunks will not display. Please refer to the .Rmd file for source code.

1 Loading packages and importing data

1.1 Packages

Details of packages, etc, are given below.

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] magick_2.3      patchwork_1.0.0 ggthemes_4.2.0  see_0.2.1      
##  [5] latex2exp_0.4.0 ggridges_0.5.1  tidybayes_2.0.2 brms_2.12.0    
##  [9] Rcpp_1.0.3      forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3    
## [13] purrr_0.3.3     readr_1.3.1     tidyr_1.0.2     tibble_2.1.3   
## [17] ggplot2_3.2.1   tidyverse_1.2.1 bookdown_0.18  
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1          rsconnect_0.8.15         
##  [3] markdown_1.1              base64enc_0.1-3          
##  [5] rstudioapi_0.10           rstan_2.19.2             
##  [7] svUnit_0.7-12             DT_0.8                   
##  [9] fansi_0.4.1               lubridate_1.7.4          
## [11] xml2_1.2.2                bridgesampling_0.7-2     
## [13] knitr_1.25                shinythemes_1.1.2        
## [15] bayesplot_1.7.0           jsonlite_1.6             
## [17] broom_0.5.2               shiny_1.3.2              
## [19] compiler_3.6.1            httr_1.4.1               
## [21] backports_1.1.5           assertthat_0.2.1         
## [23] Matrix_1.2-17             lazyeval_0.2.2           
## [25] cli_2.0.1                 later_0.8.0              
## [27] htmltools_0.3.6           prettyunits_1.1.1        
## [29] tools_3.6.1               igraph_1.2.4.1           
## [31] coda_0.19-3               gtable_0.3.0             
## [33] glue_1.3.1                reshape2_1.4.3           
## [35] cellranger_1.1.0          vctrs_0.2.2              
## [37] nlme_3.1-140              crosstalk_1.0.0          
## [39] insight_0.5.0             xfun_0.8                 
## [41] ps_1.3.0                  rvest_0.3.4              
## [43] mime_0.7                  miniUI_0.1.1.1           
## [45] lifecycle_0.1.0           gtools_3.8.1             
## [47] zoo_1.8-6                 scales_1.1.0             
## [49] colourpicker_1.0          hms_0.5.0                
## [51] promises_1.0.1            Brobdingnag_1.2-6        
## [53] parallel_3.6.1            inline_0.3.15            
## [55] shinystan_2.5.0           yaml_2.2.0               
## [57] gridExtra_2.3             loo_2.2.0                
## [59] StanHeaders_2.19.0        stringi_1.4.5            
## [61] bayestestR_0.3.0          dygraphs_1.1.1.6         
## [63] pkgbuild_1.0.6            rlang_0.4.4              
## [65] pkgconfig_2.0.3           matrixStats_0.55.0       
## [67] evaluate_0.14             lattice_0.20-38          
## [69] rstantools_2.0.0          htmlwidgets_1.3          
## [71] tidyselect_0.2.5          processx_3.4.1           
## [73] plyr_1.8.5                magrittr_1.5             
## [75] R6_2.4.1                  generics_0.0.2           
## [77] pillar_1.4.3              haven_2.1.1              
## [79] withr_2.1.2               xts_0.11-2               
## [81] abind_1.4-5               modelr_0.1.5             
## [83] crayon_1.3.4              arrayhelpers_1.0-20160527
## [85] rmarkdown_2.1             grid_3.6.1               
## [87] readxl_1.3.1              callr_3.4.1              
## [89] threejs_0.3.1             digest_0.6.23            
## [91] xtable_1.8-4              httpuv_1.5.1             
## [93] stats4_3.6.1              munsell_0.5.0            
## [95] shinyjs_1.0

1.2 Import Data

Here we import our data and make some summary plots.

1.2.1 EEG Data

Importing the primary EEG data set. This is odd-harmonic filtered data from region-of-interest consisting of six electrodes over occipital cortex.

RMS data from two of the wallpaper groups, P2 and P4M. Odd harmonics are shown in A and B, while even harmonic data are shown in C and D, and occipital and parietal regions of interest are shown in dark nad light gray, respectively. The two groups elicit very different response amplitudes for odd harmonics over occipital cortex, but for even harmonics those differences are much less pronounced.

Figure 1.1: RMS data from two of the wallpaper groups, P2 and P4M. Odd harmonics are shown in A and B, while even harmonic data are shown in C and D, and occipital and parietal regions of interest are shown in dark nad light gray, respectively. The two groups elicit very different response amplitudes for odd harmonics over occipital cortex, but for even harmonics those differences are much less pronounced.

at

## Observations: 400
## Variables: 3
## $ wg      <chr> "P2", "PM", "PG", "CM", "PMM", "PMG", "PGG", "CMM", "P4"…
## $ subject <chr> "s01", "s01", "s01", "s01", "s01", "s01", "s01", "s01", …
## $ rms     <dbl> 0.4013, 0.6555, 0.5547, 0.7635, 0.9185, 0.7285, 0.4320, …

It’s clearly skewed, and negative display duration are impossible, so will fit a glm with family = 'lognormal.

Data are the root-mean-squaured (rms) over the odd-harmonic filtered waveforms.

Figure 1.2: Data are the root-mean-squaured (rms) over the odd-harmonic filtered waveforms.

1.2.2 Threshold Data

Here we import that data and select the columns that we’re interested in. Threshold gives the required display duration (in seconds) for the two stimuli to allow for accurate discrimination.

## Observations: 186
## Variables: 3
## $ subject   <chr> "person10", "person10", "person10", "person10", "perso…
## $ wg        <chr> "CM", "CMM", "P2", "P3", "P31M", "P3M1", "P4", "P4G", …
## $ threshold <dbl> 0.74125, 0.20216, 0.47697, 0.35012, 0.24529, 0.19022, …

As above, a summary of the data.

Histogram of the display duration thresholds.

Figure 1.3: Histogram of the display duration thresholds.

Again, we have a skewed distribution, so will fit with family = 'lognormal'.

1.2.3 Control Data

In addition to the primary EEG data set, we are also importing two control data sets which are (a) even harmonic data from the same occipital electrodes, and (b) odd harmonic data from six parietal electrodes (see Figure 1.1 and the main paper).

2 Bayesian Analysis

Here are the details of the Bayesian multi-level modelling. Our general approach is:

2.1 Define Priors

In this section we will specify some priors. We then then use a prior-predictive check to assess whether the prior is reasonable or not (i.e., on the same order of magnitude as our measurements).

2.1.1 Fixed Effects

Our independent variable is a categorical factor with 16 levels. We will drop the intercept from our model and instead fit a coefficent for each factor level (\(y \sim x - 0\)). As our dependant variable will be log-transformed, we can use the priors below:

prior <- c(
  set_prior("normal(0,2)", class = "b"),    
  set_prior("cauchy(0,2)", class = "sigma"))

2.1.2 Group-level Effects

We will keep the default weakly informative priors for the group-level (‘random’) effects. From the brms documentation:

[…] restricted to be non-negative and, by default, have a half student-t prior with 3 degrees of freedom and a scale parameter that depends on the standard deviation of the response after applying the link function. Minimally, the scale parameter is 10. This prior is used (a) to be only very weakly informative in order to influence results as few as possible, while (b) providing at least some regularization to considerably improve convergence and sampling efficiency.

2.1.3 Prior Predictive Check

Now we can specify our Bayesian multi-level model and priors. Note that as we are using sample_prior = 'only', the model will not learn anything from our data.

m_prior <- brm(data = d_eeg, 
  rms ~ wg-1 + (1|subject),
  family = "lognormal", 
  prior = prior, 
  iter = n_iter ,
  sample_prior = 'only')

We can use this model to generate data.

## Observations: 320,000
## Variables: 2
## $ key   <chr> "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2"…
## $ value <dbl> 0.58997505, 2.38399527, 0.16118726, 12.75673347, 23.216127…
The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

Figure 2.1: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

We can see that i) our priors are relatively weak as the predictions span several orders of magnitide, and ii) our empirical data falls within this range.

2.2 Compute Posterior

2.2.1 Fit Model to EEG Data

We will now fit the model to the data.

##  Family: lognormal 
##   Links: mu = identity; sigma = identity 
## Formula: rms ~ wg - 1 + (1 | subject) 
##    Data: d_eeg (Number of observations: 400) 
## Samples: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 20000
## 
## Group-Level Effects: 
## ~subject (Number of levels: 25) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.38      0.06     0.29     0.52 1.00     1233     2365
## 
## Population-Level Effects: 
##        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## wgCM      -0.49      0.09    -0.66    -0.31 1.00      695     1064
## wgCMM     -0.12      0.09    -0.29     0.06 1.00      689      982
## wgP2      -0.95      0.09    -1.12    -0.77 1.00      690      967
## wgP3      -0.82      0.09    -0.99    -0.64 1.00      701     1050
## wgP31M    -0.43      0.09    -0.60    -0.26 1.01      704     1054
## wgP3M1    -0.14      0.09    -0.31     0.04 1.00      694      969
## wgP4      -0.48      0.09    -0.65    -0.30 1.01      678      973
## wgP4G     -0.26      0.09    -0.42    -0.08 1.00      689      887
## wgP4M      0.15      0.09    -0.01     0.33 1.00      699     1209
## wgP6      -0.48      0.09    -0.65    -0.30 1.00      696     1020
## wgP6M     -0.05      0.09    -0.21     0.13 1.00      687     1003
## wgPG      -0.93      0.09    -1.10    -0.75 1.00      704     1064
## wgPGG     -0.72      0.09    -0.89    -0.55 1.01      698     1106
## wgPM      -0.59      0.09    -0.76    -0.41 1.00      703      982
## wgPMG     -0.26      0.09    -0.42    -0.08 1.00      691     1001
## wgPMM     -0.07      0.09    -0.24     0.11 1.00      689     1043
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.23      0.01     0.21     0.25 1.00     8411    11033
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We will now have at what the model predicts for the average participant (i.e, ignoring the random intercepts). If anybody else would like to suggest over sensible ways to summarise this model, do let me know! I’ve been trying to use some of the functions from the tidybayes package, and they don’t (yet?) have much on random effects.

The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

Figure 2.2: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

2.2.2 Fit Model to Psychophysical Data

We will now fit the model to the data.

##  Family: lognormal 
##   Links: mu = identity; sigma = identity 
## Formula: threshold ~ wg - 1 + (1 | subject) 
##    Data: d_dispthresh (Number of observations: 186) 
## Samples: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 20000
## 
## Group-Level Effects: 
## ~subject (Number of levels: 12) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.39      0.11     0.23     0.65 1.00     3320     6391
## 
## Population-Level Effects: 
##        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## wgCM      -0.22      0.17    -0.54     0.12 1.00     3204     5211
## wgCMM     -1.15      0.17    -1.47    -0.81 1.00     3193     4952
## wgP2      -0.29      0.17    -0.62     0.05 1.00     3098     4734
## wgP3      -0.69      0.17    -1.00    -0.35 1.00     3030     5036
## wgP31M    -1.18      0.16    -1.49    -0.85 1.00     3190     4791
## wgP3M1    -1.35      0.16    -1.66    -1.02 1.00     3103     4735
## wgP4      -0.89      0.17    -1.20    -0.55 1.00     3162     5310
## wgP4G     -1.22      0.17    -1.54    -0.88 1.00     3297     5490
## wgP4M     -1.29      0.17    -1.61    -0.95 1.00     3063     5266
## wgP6      -1.20      0.17    -1.53    -0.86 1.00     3168     4805
## wgP6M     -1.42      0.17    -1.74    -1.08 1.00     3269     4816
## wgPG       0.36      0.17     0.04     0.70 1.00     3343     5404
## wgPGG     -0.31      0.16    -0.62     0.03 1.00     3238     4975
## wgPM      -0.79      0.17    -1.11    -0.44 1.00     3190     5267
## wgPMG     -0.96      0.17    -1.28    -0.63 1.00     3201     5025
## wgPMM     -1.17      0.17    -1.49    -0.84 1.00     3209     5332
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.42      0.02     0.37     0.47 1.00    13509    13539
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

Figure 2.3: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

2.2.3 EEG Control Data

We will also fit models to the control data. As we can see from Figure 2.4, the group differences are much smaller.

## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

Figure 2.4: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

3 Subgroup Comparisons

We will now compute the difference between sub- and super-groups.

3.1 Primary EEG Data

Finally, we calculate the probability that the RMS difference between subgroup and supergroup is larger than zero given the data. This information is then binned so we can colour in the posterior density plots.

Now we will use it!

Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

Figure 3.1: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

3.2 Psychophysical Data

We can do the same for the display duration thresholds from our psychophysics experiment. Here we are looking for the opposite effect, namely that display larger are larger for subgroups than for supergroups (see main paper), so we calculate the probability that differences in duration are smaller than zero.

Distributions of the difference in mean log display duration threshold between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being less than zero.

Figure 3.2: Distributions of the difference in mean log display duration threshold between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being less than zero.

3.3 Control EEG Data

We will now do exactly the same with the control data (odd harmonic data from parietal electrodes and even harmonic data from occipital electrodes)

Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

Figure 3.3: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

Figure 3.4: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

3.4 Summary

We can summarise the subgroup comparison plots above by plotting ROC curves for each of our four measurements (Figure 3.5).

This figure shows how many of our 64 comparisons are classed as having a greater-than-zero difference (less-than-zero for the display durations) for difference thresholds. between 0.5 an 1.0.

Figure 3.5: This figure shows how many of our 64 comparisons are classed as having a greater-than-zero difference (less-than-zero for the display durations) for difference thresholds. between 0.5 an 1.0.

## # A tibble: 1 x 5
##     eeg threshold occ_even par_odd     p
##   <dbl>     <dbl>    <dbl>   <dbl> <dbl>
## 1    56        49       32      22  0.95

4 Additional Analysis

4.1 Index and Normality

Subgroup relations can be classified by their index, and by whether they are normal or not. Here we investigate the extent to which these two variables can account for the variation between the subgroup relationships.

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_value ~ as.numeric(as.character(index)) + normal 
##    Data: comp_summary$eeg (Number of observations: 64) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                       0.05      0.10    -0.13     0.24 1.00
## as.numericas.characterindex     0.04      0.01     0.02     0.07 1.00
## normal                          0.14      0.07    -0.00     0.28 1.00
##                             Bulk_ESS Tail_ESS
## Intercept                       2678     2491
## as.numericas.characterindex     2866     2786
## normal                          2895     2678
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.19      0.02     0.16     0.23 1.00     3894     2940
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_value ~ as.numeric(as.character(index)) + normal 
##    Data: comp_summary$threshold (Number of observations: 64) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                      -0.20      0.16    -0.53     0.11 1.00
## as.numericas.characterindex    -0.05      0.03    -0.10     0.00 1.00
## normal                         -0.02      0.12    -0.25     0.22 1.00
##                             Bulk_ESS Tail_ESS
## Intercept                       3031     2569
## as.numericas.characterindex     3192     2835
## normal                          2980     2652
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.35      0.03     0.30     0.42 1.00     3536     2994
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

4.2 Correlation Between Primary EEG data and Psychophysical Thresholds

Finally, we will investigate whether there is a correlation between the our primary EEG measure (rms amplitude of odd harmonics over occipital cortex) and our display duration thresholds. As our two different measures come from different samples of participants, we are unable to do a direct comparison. However, we can use the results of the models discussed in Section 3 and check for a correlation between the predicted values of the two measures.

## Warning: There were 2 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
##     Estimate Est.Error       Q2.5     Q97.5
## R2 0.1612703 0.0736701 0.03275469 0.3141725

We can see that although the correlation is relatively weak, our confidence interval indicates that we can be reasonably positive that \(R^2>0\) (i.e, 95% credible interval is 0.026 - 0.302).

Scatter plot showing the correlation between our two measures. Each line is a sample from the posterior of a Bayesian linear regression.

Figure 4.1: Scatter plot showing the correlation between our two measures. Each line is a sample from the posterior of a Bayesian linear regression.